
fix: GC orphan summaries + WASM savepoint nesting [WIP 4/5]#6

Closed
andreinknv wants to merge 2 commits into pr3-split-provider-and-summary-batching from pr4-gc-wasm-fixes

Conversation

@andreinknv
Owner

Stress-test fixes — PR 4 of 5 (stacked). Base: pr3-split-provider-and-summary-batching.

Commits

  • 1e5fa19 fix: GC orphan summary rows after each summarize pass
  • 9adac63 fix: prune orphan directory_summaries; WASM adapter savepoint nesting

Summary

Symbol-summary GC (1e5fa19)

Surfaced after running --force with two different chat models on the same repo: symbol_summaries held 723 rows — 675 fresh + 48 stale from the prior model whose node_ids no longer exist. Cache-preservation in PR 2 left them behind; the by-content-hash fallback didn't reclaim them onto current node_ids.

pruneOrphanSummaries() runs DELETE FROM symbol_summaries WHERE node_id NOT IN (SELECT id FROM nodes) inside a transaction. Embedding rows cascade-delete via existing FK. Wired AFTER summarizeAll (so the content-hash reclaim has had its chance) and BEFORE embedding (so embeddings only run on live rows).

Directory-summary GC (9adac63)

Same issue for directory_summaries. Live-dir set computed via path.posix.dirname(filePath) — exactly mirroring what dir-summarizer.ts writes. Crucially, it does NOT walk the ancestor chain (the first draft did, which let stale intermediate-dir summaries survive — e.g. a src summary from when files lived directly in src/ but have since all moved into src/core/). Regression-tested.

WASM savepoint nesting (9adac63)

WasmDatabaseAdapter was using bare BEGIN/COMMIT for nested transactions, which sqlite-wasm rejects. Switched to savepoint-based nesting. Also exported the adapter from the package index so callers can construct it directly (the only way to exercise WASM on Mac, where createDatabase always picks better-sqlite3).

Test plan

  • Suite green at 1113 / 13 skip / 0 fail.
  • New tests in __tests__/migrations-022.test.ts cover live-preserved + orphan-pruned + ancestor-only-pruned scenarios.

WIP draft.

andreinknv and others added 2 commits April 30, 2026 19:52
Surfaced by end-to-end measurement: after running `--force` with two
different chat models (Phase A-MLX with qwen3-coder, then Phase A
bridge+batched with claude-haiku-4-5), the codegraph self-repo's
symbol_summaries table held 723 rows — 675 fresh haiku entries plus
48 stale qwen3-coder entries whose node_ids no longer exist in the
current `nodes` table. The orphans accumulated because the cache-
preservation work in bd26945 (clearStructural with FK toggle) left
summary rows behind on every --force, and rows that the by-content-
hash fallback in summarizeAll didn't reclaim onto a current node_id
were never cleaned up.

The impact is bookkeeping noise (inflated row counts, misleading
Summary coverage metrics) and slow content_hash index growth — no
logical bug, but worth fixing while the design is fresh.

## Implementation

New `QueryBuilder.pruneOrphanSummaries()` runs:

  DELETE FROM symbol_summaries WHERE node_id NOT IN (SELECT id FROM nodes)

inside a transaction, with before/after row counts captured atomically
in the same transaction. Returns `{ summariesDeleted, embeddingsDeleted }`.
Embedding rows cascade-delete via the existing FK CASCADE on
`symbol_embeddings.node_id REFERENCES symbol_summaries(node_id)`.

Wired into `startBackgroundSummarization` to run after Phase 1
(summarize) completes, before Phase 2 (embed). Logged via logDebug
only when something was actually deleted.
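The count-then-delete-then-recount shape can be sketched as follows. This is a minimal illustration against an assumed `DbLike` adapter shape; the SQL comes from the text above, the rest is illustrative, not the shipped QueryBuilder code:

```typescript
// Assumed minimal adapter surface (not the real better-sqlite3 typing).
interface DbLike {
  prepare(sql: string): { get(): unknown; run(): unknown };
  transaction<T>(fn: () => T): () => T;
}

interface PruneResult {
  summariesDeleted: number;
  embeddingsDeleted: number;
}

function pruneOrphanSummaries(db: DbLike): PruneResult {
  // Counts and the DELETE share one transaction so the before/after
  // snapshot is atomic (no inter-process race on the deltas).
  return db.transaction((): PruneResult => {
    const count = (table: string) =>
      (db.prepare(`SELECT COUNT(*) AS n FROM ${table}`).get() as { n: number }).n;
    const summariesBefore = count("symbol_summaries");
    const embeddingsBefore = count("symbol_embeddings");
    // Embedding rows disappear via the existing FK CASCADE on
    // symbol_embeddings.node_id REFERENCES symbol_summaries(node_id).
    db.prepare(
      "DELETE FROM symbol_summaries WHERE node_id NOT IN (SELECT id FROM nodes)",
    ).run();
    return {
      summariesDeleted: summariesBefore - count("symbol_summaries"),
      embeddingsDeleted: embeddingsBefore - count("symbol_embeddings"),
    };
  })();
}
```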

## Timing rationale

MUST run AFTER summarizeAll completes — by then any genuinely
relocated summaries (function moved files, line shifted, extraction-
version drift) have been re-attached to their new node_ids via the
content-hash fallback inside summarizeAll. Calling DURING summarizeAll
would race the reclaim and lose cache hits.

NOT called in no-chat configurations: orphans there might still be
legitimately reclaimable by a FUTURE chat-enabled session via the
content-hash fallback. Pruning prematurely would force the next chat
session to regenerate from scratch. Acceptable cost (storage growth in
configs that never gain a chat backend) for correctness in the more
common case. Documented in the method JSDoc.

## Tests

`__tests__/migrations-022.test.ts` — 2 new tests:

- `pruneOrphanSummaries deletes only summaries whose node_id is gone`
  — sets up 3 nodes + 3 summaries + 2 embeddings, simulates
  clearStructural by toggling FKs OFF and DELETEing 2 nodes (the exact
  post-clearStructural-then-reindex state), then prunes. Asserts:
  2 summaries deleted, 1 embedding cascade-deleted, 1 live summary
  + 1 live embedding survive.

- `pruneOrphanSummaries is a no-op when nothing is orphaned` — asserts
  zero deletions and zero state change when all summaries have matching
  nodes (idempotent on healthy DB).

Suite: 1095 / 13 skip / 0 fail (was 1093).

## Reviewer trail

Two passes. Pass 1: REQUEST_CHANGES + 1 substantive finding (the four
COUNT(*) reads were taken outside the DELETE transaction, leaving an
inter-process race window where the embeddingsDeleted count could be
wrong) + 2 info findings (directory_summaries scope, no-chat
limitation). Substantive fix: wrapped all four counts AND the DELETE
inside a single this.db.transaction(() => { ... return obj; })() so
the snapshot is atomic. Info finding 3 addressed by adding a JSDoc
paragraph documenting the no-chat-session reasoning.

Pass 2: APPROVE + 1 info finding (the WASM adapter doesn't support
nested transactions — not actionable here since pruneOrphanSummaries
is never called from another transaction).

## Out of scope (deferred to follow-up)

- `directory_summaries` rows for deleted directories accumulate
  similarly. Different keying (dir_path, not node_id) means a different
  prune query. Worth a future PR.
- WASM adapter savepoint support — the WasmDatabaseAdapter.transaction()
  uses bare BEGIN/COMMIT, no savepoints. Future callers that nest
  pruneOrphanSummaries inside another transaction would hit a SQLite
  "cannot start a transaction within a transaction" error on WASM only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the two follow-ups noted in 1e5fa19's commit message:

## directory_summaries orphan pruning

Same issue `pruneOrphanSummaries` fixed for symbol_summaries, but for
directory_summaries — when a dir is deleted/renamed/excluded its
summary lingers indefinitely (no FK anchor; dir_path is the PK).

`QueryBuilder.pruneOrphanDirectorySummaries()`:
- Reads all `files.path` rows.
- For each file, computes the IMMEDIATE PARENT directory via
  `path.posix.dirname(filePath.replace(/\\/g, '/'))` — exactly
  mirroring what `dir-summarizer.ts:groupByDir` writes. Crucially
  does NOT walk the ancestor chain (the first draft did, which would
  let stale intermediate-dir summaries survive — e.g. a `src` summary
  from when files lived directly in `src/` but have since all moved
  into `src/core/`).
- Inserts the live set into a `TEMP TABLE _live_dirs` and runs
  `DELETE FROM directory_summaries WHERE dir_path NOT IN (...)`.
- All four COUNT(*) reads + the DELETE share one transaction so the
  before/after snapshot is atomic (same race fix as
  `pruneOrphanSummaries`).
- Returns `{ directorySummariesDeleted }`.
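The live-dir computation reduces to a few lines. A hedged sketch (the helper name `computeLiveDirs` is an assumption; the real code feeds this set into the TEMP-table DELETE):

```typescript
import * as path from "node:path";

// Only the IMMEDIATE parent of each indexed file counts as live.
// Deliberately no ancestor walk, so a stale `src` summary is pruned
// once every file has moved into `src/core/`.
function computeLiveDirs(filePaths: string[]): Set<string> {
  const live = new Set<string>();
  for (const fp of filePaths) {
    // Normalise Windows separators before taking the posix dirname.
    live.add(path.posix.dirname(fp.replace(/\\/g, "/")));
  }
  return live;
}
```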

Wired into `startBackgroundSummarization` to run after Phase 3 (dir
summarizer). Same timing rationale as symbol prune: must run AFTER
the writer has had its chance to refresh stale entries, or we'd race
its writes for newly-summarised dirs.

### Tests in `__tests__/migrations-022.test.ts` (3 new)

- Live dirs preserved, orphan dirs pruned (mixed scenario with
  4 indexed files spanning src/a, src/b, and src directly).
- Empty files table prunes everything (degenerate case).
- **Stale ancestor-only dir IS pruned** — regression test for the
  reviewer-caught under-pruning bug. Indexes only `src/core/foo.ts`,
  plants summaries for both `src/core` (live) and `src` (stale
  ancestor-only). Asserts only `src/core` survives. Pre-fix this
  would have left `src` alive due to the ancestor walk.

## WASM adapter savepoint nesting

`node-sqlite3-wasm` doesn't transparently handle nested transactions
like better-sqlite3 does. Pre-fix, a caller wrapping a `transaction()`
inside another `transaction()` would get "cannot start a transaction
within a transaction" — only on the WASM backend, silently OK on
native. The reviewer flagged this in 1e5fa19's pass 1 as out-of-scope.

`WasmDatabaseAdapter.transaction()` in `src/db/sqlite-adapter.ts`:

- Added `private _txDepth = 0` field tracking nesting depth on the
  connection.
- Outer call (depth 0) → BEGIN/COMMIT/ROLLBACK.
- Inner call (depth > 0) → SAVEPOINT/RELEASE/ROLLBACK TO + RELEASE.
- Savepoint name `cg_sp_${depth}` is depth-tagged so nested levels
  produce stable, distinct names; safe to inline (depth is a
  non-negative integer, no SQL-injection risk).
- Rollback paths wrapped in try/catch — if the rollback itself fails,
  the original error is annotated with a "(NOTE: ROLLBACK also failed:
  ...)" suffix before re-throw. Operators reading the log get a signal
  when connection state may be uncertain.
- `finally { this._txDepth-- }` restores depth on both success and
  failure paths.
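A minimal sketch of the depth-tracked nesting, with the SQL executor stubbed as a plain callback (the `exec` signature and class shape are assumptions; the behavior follows the bullets above):

```typescript
class NestingTx {
  private _txDepth = 0;
  constructor(private exec: (sql: string) => void) {}

  transaction<T>(fn: () => T): T {
    const depth = this._txDepth;
    const sp = `cg_sp_${depth}`; // depth-tagged, integer-only name
    this.exec(depth === 0 ? "BEGIN" : `SAVEPOINT ${sp}`);
    this._txDepth++;
    try {
      const result = fn();
      this.exec(depth === 0 ? "COMMIT" : `RELEASE ${sp}`);
      return result;
    } catch (err) {
      try {
        if (depth === 0) {
          this.exec("ROLLBACK");
        } else {
          this.exec(`ROLLBACK TO ${sp}`);
          this.exec(`RELEASE ${sp}`);
        }
      } catch (rbErr) {
        // Annotate rather than swallow: operators see both failures.
        if (err instanceof Error) {
          err.message += ` (NOTE: ROLLBACK also failed: ${rbErr})`;
        }
      }
      throw err;
    } finally {
      this._txDepth--; // restored on both success and failure paths
    }
  }
}
```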

`WasmDatabaseAdapter` is now exported (was un-exported `class`) so
tests can drive it directly on machines where better-sqlite3 is
installed and `createDatabase` would otherwise pick native. Brief
JSDoc on the export explains the test-only intent.

### Tests in `__tests__/wasm-savepoint.test.ts` (NEW, 8 tests)

- Single-level commit
- Nested commit (outer + inner both succeed)
- Inner throw + outer recovers (writes survive even though inner's
  SAVEPOINT was rolled back) — this is THE case that errored pre-fix.
- Outer throw → full rollback (zero rows survive)
- Depth counter resets on success AND failure paths (third-run-after-
  failure case proves the `finally` restores depth on both branches).
- Three-deep nesting with L3 failure rolled back inside L2.
- transaction() return value passes through.
- transaction() args forwarded to wrapped function.

Suite: 1106 / 13 skip / 0 fail (was 1095, +11 new tests).

## Reviewer trail

Two passes. Pass 1: REQUEST_CHANGES + 1 substantive (under-pruning
ancestor-walk bug — would let stale `src` summary survive when files
all moved into `src/core`) + 1 info (silently swallowed outer-rollback
errors). Substantive: rewrote prune to use immediate-parent only +
added regression test that would have failed pre-fix. Info: annotate
caught rollback errors onto the original exception message. Pass 2:
REQUEST_CHANGES + 1 finding (stale JSDoc still described the removed
ancestor-walk behavior) + 1 info (frozen-Error edge case in error
mutation — theoretical, not actionable). JSDoc fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@andreinknv
Owner Author

Stack collapsed into wip-stress-base (now at 56fac32). Commits landed via the fast-forward of wip-stress-base from df96c6c to 56fac32; #3 shows as merged, #4-#7 are closed because their tips are already ancestors of the base. Branch preserved for reference.

@andreinknv andreinknv closed this May 1, 2026
@andreinknv andreinknv deleted the pr4-gc-wasm-fixes branch May 1, 2026 11:37
andreinknv added a commit that referenced this pull request May 1, 2026
… server-config flags

Tooling-gap backlog (codegraph/docs/codegraph-tooling-gaps.md) closed:
  #1 freshness severity bucket — `classifyFreshness` with fresh|recent|stale|very_stale
  #2 allowStale flag — opt-in bypass for the heavy-drift gate, registry-injected schema
  #3 module format in status — `module-format.ts` parses package.json + tsconfig (JSONC-safe)
  #4 codegraph_imports tool + import-classifier — file/directory/bare/unresolvable filters
  #5 dynamic imports — extractor catches `import('…')` + `require('…')`, incl. template_string
  #6 build-context refs — new `build_context_refs` table for `__dirname` / `import.meta.*`
  #7 files.is_test flag — column populated by glob; surfaced in status as `(N test)`
  colbymchenry#11 summarize-also-embeds (discovered while dogfooding) — `cg.summarizeAll()` chains
       `embedAllSummaries`; new `cg.embedAll()` for embed-only path; CLI `codegraph embed`

CLI/MCP alignment (5/32 → 33+/35):
  - 13 new CLI commands via `runViaMCP` shim: callers, callees, impact, node, similar,
    biomarkers, imports, help-tools, explore, hotspots, dead-code, config-refs, sql-refs,
    module-summary, role, coverage-query, pending-summaries, save-summaries, review-context
  - 7 new MCP tools: codegraph_imports, codegraph_embed, codegraph_summarize, codegraph_sync,
    codegraph_reindex, codegraph_coverage_ingest, codegraph_init, codegraph_uninit,
    codegraph_unlock, codegraph_affected

MCP server-level operator config (`codegraph serve --mcp`):
  - --no-write-tools / --allow-stale-default / --disable-tool (sandboxing)
  - --llm-endpoint / --llm-chat-model / --llm-ask-model / --llm-embedding-model /
    --llm-api-key (operator LLM config; per-project config wins on conflict)
  - New CODEGRAPH_LLM_* env vars wired through `mergeLlmEnv` in resolveLlmProviders

Architectural cleanups:
  - `bypassFreshnessGate` and `isWriteTool` declarative flags on ToolModule (replaces
    growing string-comparison chain in execute())
  - `withAllowStale` registry injection only on tools that DO see the gate
  - DRY of inline copy-paste in 3 hooks → `src/index-hooks/enclosing.ts`
  - `LlmClient.isEmbeddingReachable` for split-provider correctness
  - SyncResult `lockContention` flag → handleSync emits distinct retryable message
  - `clearStructural` deletes from build_context_refs (was orphan-leaking on --force)
  - cli:dev npm script + tsx CLI fixed (web-tree-sitter `import type` for type-only refs)

Migrations:
  023-files-is-test.ts — add `files.is_test`
  024-build-context-refs.ts — add `build_context_refs` table

Reviewer rounds: 11 total, all REQUEST_CHANGES addressed inline. Notable fixes:
  - JSONC URL strip via state machine (was eating `https://` tails)
  - classifyFreshness very_stale now requires isStale (in-sync-but-old → recent)
  - Dynamic imports also match template_string nodes
  - process.exit deferred until after finally cleanup in runViaMCP
  - --same-language / --different-language mutual exclusion guard
  - help-tools CLI bypasses isInitialized (works without a project)
  - handleUninit sweeps projectCache by getProjectRoot (no dangling alias leaks)
  - handleAffected errors instead of silently dropping unsupported glob filters
  - mergeLlmEnv preserves precedence: legacy flat config wins over env-synthesised block

Suite: 1268 passing, 1 expected red (colbymchenry#8 — undecided), 13 skipped, 1 todo, 0 regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026
After the structural diff lands, the natural next question is "did
my edits introduce or clean up any biomarker findings?". Pre-PR the
agent had to read the persisted index AND remember the baseline
state — neither of which travels well across an edit-loop. This
extension answers it from a git ref + in-memory analysis only, no
fresh sync required.

How it works
------------
For each file already extracted by compareToRef, run the existing
per-file biomarker engine (parseSource + findNodeInTree +
computeMetrics + evaluateRules) on BOTH sides. Diff findings by
(qualifiedName, kind, biomarker):
- only-in-after  → added   (new regression)
- only-in-before → cleared (refactored away)
- both           → carried (status quo, surfaced as a count only
                            so the report doesn't double in size)

Cross-file biomarkers (unused_export, god_class, feature_envy) are
out of scope: they need the whole graph at the baseline ref, which
would require persisting and replaying graph state. Per-file rules
already cover 8 of 11 biomarker types, which is the dominant signal
for an edit-loop self-report.
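The three-bucket diff is essentially a keyed set difference. A sketch under assumed type shapes (the real `FindingDelta` exports live in src/compare/index.ts):

```typescript
interface Finding {
  qualifiedName: string;
  kind: string;
  biomarker: string;
}

function computeFindingsDelta(before: Finding[], after: Finding[]) {
  // Findings are keyed by (qualifiedName, kind, biomarker).
  const key = (f: Finding) => `${f.qualifiedName}\u0000${f.kind}\u0000${f.biomarker}`;
  const beforeKeys = new Set(before.map(key));
  const afterKeys = new Set(after.map(key));
  return {
    added: after.filter((f) => !beforeKeys.has(key(f))),    // new regression
    cleared: before.filter((f) => !afterKeys.has(key(f))),  // refactored away
    carried: after.filter((f) => beforeKeys.has(key(f))).length, // count only
  };
}
```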

Surfaces
--------
- src/compare/index.ts — new FindingDelta / FindingsDelta exports,
  evaluateFileFindings + computeFindingsDelta helpers, findingsDelta
  flag on CompareOptions, optional findingsDelta field on FileDelta.
- src/mcp/tools/compare.ts — new findingsDelta arg + render sections
  ("+ findings introduced", "− findings cleared", "carried over: N").
- src/bin/codegraph.ts — --findings-delta flag on compare-to-ref CLI.
- src/biomarkers/index.ts — exports ANALYSABLE_KINDS and
  ANALYSABLE_MIN_LOC (the in-memory analysis needs the same gate as
  the indexed pass).

Tests (4 new in __tests__/compare.test.ts)
- regression case: gnarly edit on alpha → added findings on alpha
- cleanup case: gnarly baseline + clean current → cleared findings
- carry case: gnarly baseline + slightly-tweaked-but-still-gnarly
  current → carried > 0, added/cleared = 0
- no-change case: empty findings delta when nothing differs from REF

Suite: 1370/13/0 (was 1366; +4 new). tsc clean.

Reviewer pass
-------------
APPROVE with two info-level fixes addressed before commit:
- carried bucket was populated but never rendered → added a
  collapsed "N pre-existing findings carried over" line.
- carried path was untested → added the tweaked-but-gnarly case.

Dogfooded on this very PR; the report correctly surfaced findings
my own changes introduced, including a long_parameter_list warning
on computeFindingsDelta that I bundled into a FileSnapshot interface
before commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026
…olbymchenry#8)

Promotes the resolver numeric `metadata.confidence` to a categorical,
queryable column on `edges`. `codegraph_callers`, `_callees`,
`_impact` now annotate each row with `*(INFERRED)*` when the edge is
not concrete, and accept a new `minConfidence` arg to drop edges
below a chosen level — useful for refactor-impact analysis where
false-positive reach inflates the blast radius.

Migration 032 adds `edges.confidence TEXT` (nullable; the read-cast in
EDGE_SCHEMA collapses NULL → `EXTRACTED`). Nullable means legacy /
structural / extractor-direct edges need no producer change. Index
on the column for filter-down queries. Idempotent ADD COLUMN with
table-exists guard following the convention from migrations 020 /
021 / 023 / 030.

Producer: `ReferenceResolver.createEdges` calls a new free
`classifyConfidence(resolvedBy, score)` helper:
  - import / qualified-name / file-path / exact-match → EXTRACTED
  - framework with score >= 0.85 → EXTRACTED, else INFERRED
  - instance-method / fuzzy → INFERRED
AMBIGUOUS is not stamped from the resolver path in v1 — needs
tie-tracking from the candidate picker. Tracked as B3.
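The rules above transcribe almost directly into code. A sketch (the `resolvedBy` strings are assumptions about the resolver's internal naming; the 0.85 threshold comes from the text):

```typescript
type Confidence = "EXTRACTED" | "INFERRED";

function classifyConfidence(resolvedBy: string, score: number): Confidence {
  switch (resolvedBy) {
    case "import":
    case "qualified-name":
    case "file-path":
    case "exact-match":
      return "EXTRACTED";
    case "framework":
      // Framework edges are concrete only above the calibration cutoff.
      return score >= 0.85 ? "EXTRACTED" : "INFERRED";
    default:
      // instance-method, fuzzy, anything unrecognised
      return "INFERRED";
  }
}
```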

Consumers (`result-formatters.ts`):
  - `formatConfidence(edge)` — markdown italic suffix (empty for
    EXTRACTED; `*(INFERRED)*` / `*(AMBIGUOUS)*` otherwise).
  - `parseMinConfidence(raw)` — returns level / null / errorResult.
  - `filterByConfidence(rows, min)` — drop below threshold.
  - `CONFIDENCE_RANK` — numeric ordering for inline filter loops.
Wired into all four code paths in `callers.ts` (collectCallers,
collectCallersForSource, collectTypeUsers, formatGroupedCallers),
both paths in `callees.ts`, and `impact.ts` with a
`filterImpactByConfidence` BFS that prunes nodes unreachable along
the surviving edge set.

Type cleanup: `Edge.confidence` added with JSDoc; the long-dead
`Edge.provenance` field also got a JSDoc note pointing at the
SCIP-integration-reservation explanation in memory.

Reviewer-memo #4 catch (schema-version test forgetfulness): noticed
`__tests__/pr19-improvements.test.ts` was still asserting `toBe(29)`
even though migrations 030 and 031 had shipped. Catch-up to 32 in
this commit covers both the missed bumps and the new migration.

Reviewer: APPROVE, three info-only findings (type-user display gap,
framework-rule calibration table doc, catch of the recurring pr19
pattern).

Suite: 1555 / 34 / 0 (+18 new edge-confidence tests).
Eval: 11/11 passed | recall=1.00 | mrr=0.86 (within regression
budget vs 091e935 baseline).

Phase 3 done. Backlog now down to: #6 TOON (deferred), colbymchenry#11
trace_to_culprits, colbymchenry#17/colbymchenry#18 Phase 7, colbymchenry#19 streaming.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 4, 2026
…sites (colbymchenry#11)

Last Phase 1-6 backlog item. Replaces the "agent stitches
codegraph_callers + _history + _biomarkers manually for every stack
frame" loop with one tool: paste a stack trace, get a ranked list of
likely fix-site candidates with per-row "why suspected" reasons.

## Pipeline

1. `parseTrace(text)` — multi-format extractor: V8/Node, Python,
   Java/Kotlin/Scala, Go, Rust, plus a generic `path/to/file.ext:line`
   fallback. Caps at 200 frames; dedupes by `(file, line)`; normalises
   Windows backslashes; first-pattern-per-line wins.
2. `resolveTraceFile` — longest-suffix match between absolute trace
   paths and project-relative indexed paths, with a path-boundary
   check so `foo/bar.ts` doesn't match `sub-foo/bar.ts`.
3. `findEnclosingNode` (existing helper from index-hooks/enclosing.ts,
   shared with codegraph_grep) attributes each frame to its enclosing
   function/method/class via line number.
4. `scoreCulprits` composite: `1 / (topRank + 1)` for frame position
   (top of stack wins decisively), `+0.3` for risk-biomarker overlap
   (god_class / complex_method / large_method / nested_complexity),
   `+0.2` for recent file churn (touched within 30 days, with commit
   count surfaced in the reason).
5. Output: ranked candidates with per-row reasons + an "unmapped
   frames" footer (capped at 5 samples) so the agent sees what
   couldn't be attributed (`node_modules/`, generated files, etc.).
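Step 1's V8 pattern plus the generic fallback can be sketched as below (the regexes are illustrative assumptions; the real parser also covers Python, Java/Kotlin/Scala, Go, and Rust):

```typescript
interface Frame {
  file: string;
  line: number;
}

// First pattern per line wins; V8 tried before the generic fallback.
const V8_FRAME = /at\s+(?:.+?\s+\()?(.+?):(\d+):\d+\)?\s*$/;
const GENERIC = /([\w./\\-]+\.\w+):(\d+)/;

function parseTrace(text: string, cap = 200): Frame[] {
  const frames: Frame[] = [];
  const seen = new Set<string>();
  for (const raw of text.split("\n")) {
    if (frames.length >= cap) break; // 200-frame cap
    const m = V8_FRAME.exec(raw) ?? GENERIC.exec(raw);
    if (!m) continue;
    const file = m[1].replace(/\\/g, "/"); // normalise Windows backslashes
    const line = Number(m[2]);
    const key = `${file}:${line}`; // dedupe by (file, line)
    if (seen.has(key)) continue;
    seen.add(key);
    frames.push({ file, line });
  }
  return frames;
}
```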

## CLI mirror

`codegraph trace-to-culprits` reads from stdin (`cat error.log |
codegraph trace-to-culprits`) or from `--trace "..."`. Routes via
runViaMCP per the repo convention.

## Reviewer cycle

Round 1 returned BLOCK + REQUEST_CHANGES + info — all addressed:
  - **R1 (BLOCK, correctness)**: `recentCutoff` was computed in ms
    but `last_touched_ts` is unix seconds — the `>=` comparison
    always evaluated false, killing the churn signal silently on
    every project. Fixed with `Math.floor(Date.now() / 1000)` and a
    new R1-regression-guard test that stamps a fresh `last_touched_ts`
    via the same `applyChurnDeltas` helper the churn miner uses, then
    asserts the rendered output mentions "recent churn".
  - **R2 (REQUEST_CHANGES, correctness)**: inline `cg.queries.db
    .prepare(...)` violated the per-domain query-file convention and
    re-prepared the same statement per culprit. Switched to
    `getFileByPath` from queries-files.ts (cached statement, returns
    FileRecord with `lastTouchedTs` and `commitCount`).
  - **R3 (info, perf)**: `resolveTraceFile` ran `getAllFiles` once
    per frame — N full-table scans for an N-frame trace. Hoisted to
    once-per-call at the top of `buildCulprits`.

## Tests

13 trace tests in `__tests__/mcp-trace-to-culprits.test.ts`:
  - parseTrace: V8 / Python / Java / Go / dedup / 200-frame cap /
    no-frames / Windows backslash normalisation
  - End-to-end: V8 trace → enclosing symbols ranked top-of-stack
    first, unmapped-frames footer, no-refs error message, limit cap,
    R1-regression guard for the churn signal

CLI dogfood: `cat trace | npm run cli:dev -- trace-to-culprits`
parses a real production-shaped V8 trace and maps frames to
`src/mcp/tools/trace-to-culprits.ts` symbols.

Suite: **1595 / 34 / 0** (+13 trace tests).
Eval: 18/18 | recall=1.00 | mrr=0.91 | within budget (re-baselined
after colbymchenry#11 added new files to the corpus, which shifted the
explore-pipeline case rank — case still PASSes its 0.5 threshold,
the relative regression was corpus drift, not behaviour change).

## Backlog after this

Phase 1-6 complete. Remaining: Phase 7 (colbymchenry#17 propose_extract /
propose_rename, colbymchenry#18 plan-and-execute CLI — both need user check),
Phase 8 (colbymchenry#19 streaming MCP), and #6 TOON (deferred pending
measurement).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 4, 2026
Backlog #6 proposed adopting TOON (Token-Oriented Object Notation,
header-once / rows-as-tuples) as a smaller alternative to the
current markdown formatters for tabular MCP responses. The backlog
explicitly gated the decision: "Verify actual savings on captured
queries before flipping default — the 30-60% claim is for ideal
tabular data."

This commit ships the measurement and the answer.

## Result

Eight representative queries against this project's own .codegraph,
covering search (long signatures), suggest (short rows), callers
(20-row mixed-confidence list):

  sample                                  rows  md(B)  toon(B)  saving
  ──────────────────────────────────────  ────  ─────  ───────  ──────
  search "CodeGraph"                        10    683      661  +3.2%
  search "extractFromSource"                10   1483     1444  +2.6%
  search "compareToRef"                      3    461      468  -1.5%
  search "handleSearch"                      6    760      757  +0.4%
  search "parseTrace"                        2    291      298  -2.4%
  suggest "CodGrap"                         10    408      409  -0.2%
  suggest "extracFromSorce"                 10    386      387  -0.3%
  callers of extractFromSource              14    936     1037  -10.8%

  TOTAL: md=5408B  toon=5461B  aggregate saving -1.0%

## Why the 30-60% claim doesn't apply

TOON's win comes from compressing a verbose JSON baseline like
`[{"name":"foo","kind":"function","file":"a.ts","line":42},…]` —
the "headers repeated per row" + JSON quoting waste is what its
header-once shape removes.

But our markdown is ALREADY row-shaped:
  - `### foo (function)` ≈ `foo,function` (no quoting).
  - `a.ts:42` ≈ `a.ts,42` (single-char delimiter, both compact).
  - `- name (kind) - file:line` is shorter than the comma-tuple form
    when names are short.

There's no fat to trim. On callers (the densest row shape) TOON is
10% LARGER because the per-row `- ` bullet syntax is a one-byte
overhead while TOON's commas waste two chars between every field.

## Decision

Skip TOON. Empirical answer matches the backlog's gate — the
savings aren't there for our markdown baseline. Adopting it would:
  - Add per-tool format selector code + tests.
  - Risk LLM-client misrender (TOON is new; some clients haven't
    trained on it).
  - Net zero to net negative on payload bytes.

The measurement script (`__tests__/evaluation/toon-measure.ts`)
stays in the repo as a permanent record + a re-runnable harness if
the markdown formatters ever get verbose enough to flip the math.
Run with `npx tsx __tests__/evaluation/toon-measure.ts`.

## Backlog

#6 closed. Phase 1-6 of the agentic backlog is now fully resolved
(every item shipped or empirically dismissed). Remaining: Phase 7
(colbymchenry#17 propose_extract / propose_rename, colbymchenry#18 plan-and-execute CLI),
Phase 8 (colbymchenry#19 streaming MCP) — all need user check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 9, 2026
…search

Subtask #6 of the embedding-features arc Stage 3. New module
src/llm/query-expansion.ts that asks the configured chat backend
for paraphrases of a search query and caches the result by
sha256(n + ':' + text). Wired into searchHybridBlendWithEmbeddings:
when the top semantic-hit cosine score < 0.5 AND a chat backend is
reachable, the function expands the query into 3 paraphrases,
embeds each, runs them through llmSemanticTopK, and max-pools the
candidate scores across the original + expansions.

Anti-goals enforced:
- Returns [] on LLM error / abort / unreachable backend so callers
  fall back to the unexpanded path without paying a query stall.
- LRU-ish cache bounded at 256 entries — bumps recency on read,
  evicts oldest on write at capacity.
- Filters out paraphrases identical to the original (de-dup) and
  lines >= 500 chars (LLM hallucination guard).

Threshold + paraphrase count live as module-private constants
(QUERY_EXPANSION_SCORE_THRESHOLD = 0.5, _PARAPHRASE_COUNT = 3).
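The bounded LRU-ish cache relies on Map's insertion-order guarantee. A sketch (function names are assumptions; the real module is src/llm/query-expansion.ts):

```typescript
import { createHash } from "node:crypto";

const CACHE_CAPACITY = 256;
const cache = new Map<string, string[]>();

// Key by sha256(n + ':' + text), as described above.
function cacheKey(n: number, text: string): string {
  return createHash("sha256").update(`${n}:${text}`).digest("hex");
}

function cacheGet(key: string): string[] | undefined {
  const hit = cache.get(key);
  if (hit !== undefined) {
    cache.delete(key); // bump recency on read:
    cache.set(key, hit); // re-insert moves the key to newest position
  }
  return hit;
}

function cachePut(key: string, paraphrases: string[]): void {
  if (cache.size >= CACHE_CAPACITY && !cache.has(key)) {
    // Map iterates in insertion order, so the first key is the oldest.
    const oldest = cache.keys().next().value as string;
    cache.delete(oldest); // evict oldest on write at capacity
  }
  cache.set(key, paraphrases);
}
```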

Suite: 2136 -> 2163 (+27 tests covering input validation, LLM call
shape, output parsing, error handling, caching, duplicate filter).
Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 10, 2026
Adds `compact: true` to codegraph_callers + codegraph_callees inputSchema.
When set, the formatNodeList renderer emits one terse line per node:

  Callers of foo (3)
  bar|function|src/x.ts:42|id:n_a1b2c3d4
  baz|method|src/y.ts:120|id:n_e5f6g7h8

instead of the default markdown:

  ## Callers of foo (3 found)

  - bar (function) - src/x.ts:42 *(EXTRACTED)* (2 call sites: 12, 45) `[id: n_a1b2c3d4]`

Saves ~50-70% of output tokens on graph-walk chains. Default false
preserves the rich human-readable output for ad-hoc inspection.

Pattern is straightforward to extend to the other navigation tools
(search, refs, impact). Tracked as Stage 6 #6.1 follow-up.
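The two render modes boil down to one branch per row. A sketch (the `NodeRow` shape and helper name are assumptions; the call-site and confidence annotations from the rich mode are omitted):

```typescript
interface NodeRow {
  name: string;
  kind: string;
  path: string;
  line: number;
  id: string;
}

function renderRow(n: NodeRow, compact: boolean): string {
  return compact
    ? `${n.name}|${n.kind}|${n.path}:${n.line}|id:${n.id}`
    : `- ${n.name} (${n.kind}) - ${n.path}:${n.line} \`[id: ${n.id}]\``;
}
```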

Suite remains 2374/0/34. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 10, 2026
New MCP tool consolidating 5-10 chained codegraph_callers/callees
calls into one BFS traversal. Inputs:

- start: symbol name OR n_xxxxxxxx UID
- direction: 'callers' | 'callees' | 'impact' (bidirectional)
- hops: 1-5, default 2
- maxNodes: 1-200, default 50
- edgeKind?: filter to one edge kind
- compact?: default TRUE (this tool's purpose IS terse output)
- projectPath?: standard

BFS deduplicates via node_id Map, first-seen wins (shorter depth
preferred). Excluded edge kinds: similar_to / def_use / contains
(structural noise) on unfiltered walks. Output rows:
`name|kind|path:line|depth=N|via=parent_name|id:n_xxxxxxxx`.

26 tests in __tests__/mcp-walk.test.ts covering input validation,
BFS shape (hops/cycles/dedup/maxNodes cap), output format
(compact vs markdown), and registration.

Tool registered. Suite passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 10, 2026
…#6.2 + #6.3)

Closes the inline Stage 6 deferrals.

#6.2 since-tokens — codegraph_callers and codegraph_callees now
support `since: 'c_xxxxxxxx'` delta-mode chains. Same pattern as
codegraph_search / codegraph_grep / codegraph_explore (the
applyDeltaSince + mintCallId + appendCallIdFooter helpers from
shared.ts). Saves ~40-60% of output on multi-step walks where the
agent already saw most of the prior result.

#6.3 field projection — both tools accept a `fields: string[]` arg
that restricts compact-mode rows to a subset of {name, kind, path,
line, id}. Combined with `compact: true`, drops to the absolute
minimum the caller needs:
  fields: ["name", "path"] → `findSimilarViaVec|src/db/vec-helpers.ts`
Saves another ~50% on top of compact.

Both flags are opt-in; default behavior unchanged.

Suite remains 2433/0/34. Typecheck clean.

Stage 6 token-cost reductions now complete: #6.1 compact mode +
#6.2 since-tokens + #6.3 field projection + #6.4 codegraph_walk
all shipped. Cumulative effect on graph-heavy sessions: 60-80%
output-token reduction when all flags are set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 10, 2026
Reviewer pass on c895348..5730d98 caught 3 request_changes + 3 info;
all addressed. Also includes the deferred staleness gating (info #6)
and the playbook updates the user requested as a follow-up.

## Request_changes (real bugs)

1. **Chunk vec0 not cleared on full reset** (queries.ts + vec-helpers.ts):
   `clearAll` deletes nodes with FKs OFF, so the ON DELETE CASCADE on
   `symbol_chunk_embeddings` never fires. `clearVecTables` then only
   wiped `vec_symbol_embeddings_*` — leaving the chunk vec0 mirror
   tables full of ghost rowids that would poison the next index run's
   `findSimilarViaVec` chunk queries.
   Fixed: explicit `DELETE FROM symbol_chunk_embeddings` inside
   `clearAll`'s transaction; `clearVecTables` now also discovers and
   wipes `vec_chunk_embeddings_*` via the same LIKE pattern + regex
   whitelist.
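   The discover-and-wipe shape can be sketched like this — in the real code
   the candidates come from `sqlite_master` via the LIKE pattern; the regex
   and function name here are illustrative:

   ```typescript
   // Sketch of the LIKE-pattern + regex-whitelist guard: candidates would be
   // `SELECT name FROM sqlite_master WHERE name LIKE 'vec_%'`; the regex then
   // ensures only the expected embedding mirror tables get wiped.
   const VEC_TABLE_RE = /^vec_(symbol|chunk)_embeddings_[a-z0-9_]+$/;

   function selectVecTablesToWipe(candidates: string[]): string[] {
     return candidates.filter((name) => VEC_TABLE_RE.test(name));
   }
   ```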

2. **Docstring rot on upsertChunkEmbedding** (queries-chunk-embeddings.ts):
   JSDoc claimed "false when an existing row was updated" but
   `ON CONFLICT DO UPDATE` produces `changes=1` on both INSERT and
   UPDATE — the function returned true on both. Tightened JSDoc to
   match actual behavior; simplified the return condition to
   `result.changes > 0` (drops the meaningless lastInsertRowid check).
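   The semantics behind that fix, sketched with an illustrative stub (the
   `RunResult` shape mirrors better-sqlite3's, but this is not the repo's
   actual function):

   ```typescript
   // Why the JSDoc was wrong: `INSERT ... ON CONFLICT ... DO UPDATE` reports
   // changes=1 on BOTH the insert and the update branch, so the result cannot
   // distinguish the two cases.
   interface RunResult {
     changes: number;
     lastInsertRowid: number | bigint;
   }

   function upsertSucceeded(result: RunResult): boolean {
     // `changes > 0` is the honest condition; lastInsertRowid adds nothing
     // because it is stale on the UPDATE branch.
     return result.changes > 0;
   }
   ```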

3. **codegraph_walk missing CLI mirror** (bin/codegraph.ts): the new
   MCP tool had no `program.command('walk')` counterpart, violating
   the repo's CLI/MCP alignment convention. Added a thin CLI command
   that delegates via runViaMCP — same shape as the other delegating
   subcommands.

## Info-level

4. **Deferred staleness gating shipped** (multi-vec.ts): previously
   re-embedded every long symbol's chunks on every sync. Now stamps
   the file's content_hash on each chunk row (in the previously-
   unused summary_hash_at_embed column, repurposed for multi-vec) and
   discoverLongSymbols' SELECT skips symbols whose stored hash
   matches the current files.content_hash. Idempotent re-syncs with
   no source drift now do zero embed work.
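   A sketch of the gate (shapes hypothetical — the real check is a SQL
   predicate inside discoverLongSymbols' SELECT, not a TS function):

   ```typescript
   // Staleness gate: a symbol's chunks are re-embedded only when the file's
   // current content_hash has drifted from the hash stamped on the chunk row
   // at embed time (stored in the repurposed summary_hash_at_embed column).
   interface ChunkRow {
     symbolId: string;
     stampedHash: string | null; // null = never embedded under this scheme
   }

   function needsReembed(chunk: ChunkRow, currentFileHash: string): boolean {
     return chunk.stampedHash === null || chunk.stampedHash !== currentFileHash;
   }
   ```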

5. **parseFieldsArg duplication** (shared.ts + callers/callees):
   Extracted to `shared.ts` as `parseFieldsArg` + `CompactFieldName`
   type. Callers and callees both import the shared version.

6. **Playbook updates** (server-instructions.ts):
   - Added `codegraph_walk` use-case bullet under graph-walk tools
   - New "Token-cost flags" section documenting compact / fields /
     since on callers + callees
   - Updated `codegraph_risk_review` reference to the merged
     `codegraph_review({mode: 'risk'})` form

Skipped (genuine non-issues): the `since` delta filter on callers
not covering typeUsers (typeUsers is a small distinct section by
design, not part of the same-rank dedup); CRLF handling in
sliceNodeBody (would require explicit \r\n splitting; deferred
until Windows operators report regressions).

Suite remains 2433/0/34. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 11, 2026
…olved-by-FTS

Three more friction closures from the 2026-05-11 workout:

- src/extraction/index.ts + src/db/queries-files.ts: heal-marked
  files now propagate through the git fast-path. The
  EXTRACTION_LOGIC_VERSION self-heal sets needs_reextract=1 on every
  file, but the git fast-path eoCollectGitChanges only iterated
  files git reported as changed — so heal-marked files with no
  diff were silently skipped in any git-tracked project. Adds a
  getFilesNeedingReextract(qb) query and unions the result into
  the change set after the git fast-path returns. Full-scan path
  was already correct via eoClassifyFileChange; no change there.
  Closes friction colbymchenry#22 (surfaced while validating the cluster fix
  in 6957b74 — the original needs_reextract=1 → 620 of 627 stayed
  pending after sync, requiring admin index --force).
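  The fix's shape, sketched under assumptions (function and parameter names
  here are hypothetical; the real union happens after the git fast-path
  returns, fed by the new getFilesNeedingReextract query):

  ```typescript
  // Union the git fast-path's change set with the heal-flagged files, so a
  // needs_reextract=1 row survives even when git reports no diff for it.
  function unionChangeSet(
    gitChanged: string[],
    needingReextract: string[],
  ): string[] {
    const set = new Set(gitChanged);
    for (const f of needingReextract) set.add(f);
    return [...set];
  }
  ```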

- src/mcp/tools/entry-points.ts: new "MCP tools" bucket scans for
  exported `*_TOOL` constants whose initializer signature contains
  `name: 'codegraph_*'` and surfaces them with the parsed canonical
  tool name. Closes friction colbymchenry#13 / cluster symptom #6: every MCP
  tool dispatcher previously fell through every bucket because
  routes/CLI commands/CLI-files/public-exports each had a different
  filter mismatch. Five buckets now (HTTP routes, CLI commands,
  MCP tools, CLI files, public exports). __tests__/mcp-entry-points
  grows 5 → 6 tests with the new bucket fixture.
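  The new bucket's scan, sketched as a regex over source text (the real
  implementation presumably walks extracted nodes; the regex and function
  name are illustrative):

  ```typescript
  // Find exported `*_TOOL` constants whose initializer contains
  // `name: 'codegraph_*'`, and surface the parsed canonical tool name.
  const TOOL_RE =
    /export\s+const\s+(\w+_TOOL)\b[^;]*?name:\s*'(codegraph_\w+)'/g;

  function scanMcpTools(
    source: string,
  ): Array<{ constant: string; tool: string }> {
    return [...source.matchAll(TOOL_RE)].map((m) => ({
      constant: m[1],
      tool: m[2],
    }));
  }
  ```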

- Friction colbymchenry#9 (semantic/intent search can't bridge tool-name →
  handler-name): resolved by existing FTS5 path. Investigation
  showed nodes_fts indexes the `signature` column at bm25=2, so
  searching `codegraph_X` already returns the matching XXX_TOOL
  constant via signature substring match. The morning-workout
  "No results" observation was likely a stale-index transient
  (pre-cluster-fix). Added a regression test in mcp-search-family
  to lock in the behavior.

Test impact: 2633 → 2635 (one new MCP-tools-bucket test, one new
canonical-tool-name regression test). Type-check clean.

Closes:
- colbymchenry#22 EXTRACTION_LOGIC_VERSION heal flag honored
- colbymchenry#13 entry_points 5th bucket for MCP tool dispatchers
- colbymchenry#9 already-resolved-by-FTS — locked in with regression test

Still open:
- colbymchenry#8 / cluster #1 (context whiff): graph edge now exists, but the
  context-builder ranker still surfaces ToolHandler/ToolResult/
  ToolCtx instead of dispatchers. Needs a context-ranker boost
  for `codegraph_*` queries. Logged as colbymchenry#24 follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>